Smart Paper Notes

Coevolutionary Framework for Generalized Multimodal Multi-objective Optimization

Wenhua Li , Xingyi Yao , Kaiwen Li , Rui Wang , Tao Zhang

Category: Neural and Evolutionary Computing

2022-12-02

Most multimodal multi-objective evolutionary algorithms (MMEAs) aim to find all global Pareto optimal sets (PSs) for a multimodal multi-objective optimization problem (MMOP). However, in real-world problems, decision makers (DMs) may also be interested in local PSs. Searching for both global and local PSs is also a more general way of dealing with MMOPs, and can be seen as a generalized MMOP. In addition, state-of-the-art MMEAs exhibit poor convergence on high-dimensional MMOPs. To address these two issues, this study proposes a novel coevolutionary framework for multimodal multi-objective optimization, termed CoMMEA, to better obtain both global and local PSs and, at the same time, to improve convergence on high-dimensional MMOPs. Specifically, CoMMEA introduces two archives into the search process and coevolves them simultaneously through effective knowledge transfer. The convergence archive helps CoMMEA quickly approach the Pareto optimal front (PF). The knowledge of the converged solutions is then transferred to the diversity archive, which uses a local convergence indicator and an $\epsilon$-dominance-based method to obtain global and local PSs effectively. Experimental results show that CoMMEA is competitive with seven state-of-the-art MMEAs on fifty-four complex MMOPs.
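
To make the $\epsilon$-dominance idea concrete, here is a minimal sketch in Python. It assumes minimization and the common additive form of $\epsilon$-dominance. The function names are illustrative, and the abstract does not specify the exact variant or the local convergence indicator used in CoMMEA, so this should not be read as the authors' implementation.

```python
def dominates(a, b):
    """Pareto dominance (minimization): a is no worse than b in every
    objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def epsilon_dominates(a, b, eps):
    """Additive epsilon-dominance (minimization): after relaxing a by eps
    in every objective, a is no worse than b everywhere."""
    return all(x - eps <= y for x, y in zip(a, b))


# (1, 2) is better than (2, 3) in both objectives.
print(dominates((1, 2), (2, 3)))
# (1, 3) and (2, 2) trade off against each other: neither dominates.
print(dominates((1, 3), (2, 2)))
# A point within eps of another still epsilon-dominates it.
print(epsilon_dominates((1.05, 3.0), (1.0, 3.0), eps=0.1))
```

The tolerance `eps` controls how much slack counts as "equally good", which is the kind of relaxation that lets an archive treat slightly different objective vectors as comparable.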

translated by Google Translate

Consistent Teacher Provides Better Supervision in Semi-supervised Object Detection

Xinjiang Wang , Xingyi Yang , Shilong Zhang , Yijiang Li , Litong Feng , Shijie Fang , Chengqi Lyu , Kai Chen , Wayne Zhang

Category: Computer Vision

2022-09-04

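In this study, we take a close look at the unique challenges of semi-supervised object detection (SSOD). We observe that current detectors generally suffer from three inconsistency problems: 1) assignment inconsistency, where the conventional assignment policy is sensitive to labeling noise; 2) subtask inconsistency, where the classification and regression predictions are misaligned at the same feature point; and 3) temporal inconsistency, where the pseudo bounding boxes vary dramatically across training steps. These issues lead to inconsistent optimization objectives for the student network, which degrades performance and slows model convergence. We therefore propose a systematic solution, termed Consistent Teacher, to remedy these challenges. First, adaptive anchor assignment replaces the static assignment strategy, which makes the student network robust to noisy pseudo bounding boxes. Second, we calibrate the subtask predictions by designing a feature alignment module. Finally, we adopt a Gaussian Mixture Model (GMM) to dynamically adjust the pseudo-box threshold. Consistent Teacher provides a new strong baseline across a wide range of SSOD evaluations. With only 10% of the MS-COCO data annotated, it achieves 40.0 mAP with a ResNet-50 backbone, surpassing previous pseudo-label-based baselines by about 4 mAP. When trained on fully annotated MS-COCO with additional unlabeled data, the performance further increases to 49.1 mAP. Our code will be open-sourced soon.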
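
The GMM-based threshold can be illustrated with a small, self-contained sketch. It assumes that confidence scores form a low-score and a high-score cluster, fits a two-component 1-D Gaussian mixture with plain EM, and takes the lowest score assigned to the high-score component as the threshold. The function names, the initialization, and the selection rule are assumptions made for illustration and are not taken from the Consistent Teacher code.

```python
import math


def _weighted_pdfs(s, weights, means, variances):
    """Weighted Gaussian density of score s under each component."""
    return [
        w * math.exp(-((s - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm_1d(scores, iters=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture to scores with plain EM."""
    lo, hi = min(scores), max(scores)
    weights = [0.5, 0.5]
    means = [lo, hi]  # component 0 starts low, component 1 starts high
    init_var = max((hi - lo) ** 2 / 16, min_var)
    variances = [init_var, init_var]
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in scores:
            p = _weighted_pdfs(s, weights, means, variances)
            total = sum(p)
            resp.append([x / total for x in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component collapsed; keep its previous parameters
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)
    return weights, means, variances


def dynamic_threshold(scores):
    """Lowest score that the mixture assigns to the high-score component."""
    weights, means, variances = fit_gmm_1d(scores)
    high = 0 if means[0] > means[1] else 1
    kept = []
    for s in scores:
        p = _weighted_pdfs(s, weights, means, variances)
        if p[high] > p[1 - high]:
            kept.append(s)
    return min(kept) if kept else max(scores)


# Hypothetical confidence scores of predicted boxes for one class.
scores = [0.05, 0.10, 0.12, 0.08, 0.90, 0.85, 0.95]
threshold = dynamic_threshold(scores)
pseudo_labels = [s for s in scores if s >= threshold]
print(threshold, pseudo_labels)
```

Because the threshold is re-estimated from the current score distribution, it can move as the teacher's predictions change during training, unlike a fixed cutoff.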


DoF-NeRF: Depth-of-Field Meets Neural Radiance Fields

Zijin Wu , Xingyi Li , Juewen Peng , Hao Lu , Zhiguo Cao , Weicai Zhong

Category: Computer Vision

2022-08-01

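Neural Radiance Field (NeRF) and its variants have shown great success in representing 3D scenes and synthesizing photo-realistic novel views. However, they are generally based on the pinhole camera model and assume all-in-focus inputs. This limits their applicability, because images captured in the real world often have a finite depth of field (DoF). To mitigate this issue, we introduce DoF-NeRF, a novel neural rendering approach that can handle shallow-DoF inputs and simulate the DoF effect. In particular, it extends NeRF to simulate the aperture of a lens following the principles of geometric optics. This physical grounding allows DoF-NeRF to handle views with different focus configurations. Benefiting from explicit aperture modeling, DoF-NeRF also enables direct manipulation of the DoF effect by adjusting the virtual aperture and focus parameters. It is plug-and-play and can be inserted into NeRF-based frameworks. Experiments on synthetic and real-world datasets show that DoF-NeRF not only performs comparably with NeRF in the all-in-focus setting, but can also synthesize all-in-focus novel views conditioned on shallow-DoF inputs. An interesting application of DoF-NeRF to DoF rendering is also demonstrated. The source code will be made available at https://github.com/zijinwuzijin/dof-nerf.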
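
As a rough illustration of the geometric optics behind depth of field, the sketch below computes the circle-of-confusion diameter from the standard thin-lens model: a point away from the focus distance is imaged as a blur circle whose size grows with the aperture. The function and parameter names are illustrative, and this is the textbook formula rather than the specific aperture model used in DoF-NeRF.

```python
def circle_of_confusion(focal_length, aperture_diameter,
                        focus_distance, object_distance):
    """Blur-circle diameter on the sensor under the thin-lens model.

    All lengths share one unit (for example millimetres). focus_distance
    is the distance at which the lens is focused; object_distance is the
    distance of the point being imaged.
    """
    if focus_distance <= focal_length:
        raise ValueError("focus_distance must exceed focal_length")
    if object_distance <= 0:
        raise ValueError("object_distance must be positive")
    # Magnification of the in-focus plane.
    magnification = focal_length / (focus_distance - focal_length)
    # Relative defocus of the object with respect to the focus plane.
    defocus = abs(object_distance - focus_distance) / object_distance
    return aperture_diameter * magnification * defocus


# A 50 mm lens at f/2 (25 mm aperture) focused at 2 m.
in_focus = circle_of_confusion(50.0, 25.0, 2000.0, 2000.0)
background = circle_of_confusion(50.0, 25.0, 2000.0, 4000.0)
# Stopping down to f/8 (6.25 mm aperture) shrinks the blur.
stopped_down = circle_of_confusion(50.0, 6.25, 2000.0, 4000.0)
print(in_focus, background, stopped_down)
```

A point on the focus plane has zero blur, and the blur scales linearly with the aperture diameter, which is why adjusting a virtual aperture and focus distance is enough to control the rendered DoF effect.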


Less is More: Consistent Video Depth Estimation with Masked Frames Modeling

Yiran Wang , Zhiyu Pan , Xingyi Li , Zhiguo Cao , Ke Xian , Jianming Zhang

Category: Computer Vision

2022-07-31

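Temporal consistency is the key challenge in video depth estimation. Previous works rely on additional optical flow or camera poses, which are time-consuming to obtain. In contrast, we achieve consistency with less information. Because videos inherently contain heavy temporal redundancy, a missing frame can be recovered from neighboring frames. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer network that predicts the depth of masked frames from their neighboring frames. By reconstructing masked temporal features, FMNet learns intrinsic inter-frame correlations, which leads to consistency. Experimental results show that, compared with prior art, our approach achieves comparable spatial accuracy and higher temporal consistency without any additional information. Our work provides a new perspective on consistent video depth estimation.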


Classifying COVID-19 vaccine narratives

Yue Li , Carolina Scarton , Xingyi Song , Kalina Bontcheva

Category: Natural Language Processing

2022-07-18

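COVID-19 vaccine hesitancy is widespread, despite government information campaigns and the efforts of the WHO. One of the reasons behind this is vaccine disinformation, which spreads widely on social media. In particular, recent surveys have established that vaccine disinformation is negatively affecting trust in COVID-19 vaccination. At the same time, fact-checkers are struggling to detect and track vaccine disinformation because of the sheer scale of social media. To help fact-checkers monitor vaccine narratives online, this paper studies a new vaccine narrative classification task, which categorises COVID-19 vaccine claims into one of seven categories. Following a data augmentation approach, we first construct a novel dataset for this new classification task, focusing on the minority classes. We also make use of data annotated by fact-checkers. The paper also presents a neural vaccine narrative classifier that achieves an accuracy of 84% under cross-validation. The classifier is publicly available to researchers and journalists.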


NMS Strikes Back

Jeffrey Ouyang-Zhang , Jang Hyun Cho , Xingyi Zhou , Philipp Krähenbühl

Category: Computer Vision

2022-12-12

Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe that one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector, which trains Deformable-DETR with traditional IoU-based label assignment, achieves 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show that bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.
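
For readers unfamiliar with the post-processing step that one-to-many assignment relies on, the following is a minimal pure-Python sketch of greedy non-maximum suppression. The box format `(x1, y1, x2, y2)`, the function names, and the 0.5 IoU threshold are assumptions chosen for illustration; this is not code from the DETA repository.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one object plus a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

With one-to-many assignment, several queries can fire on the same object, so a duplicate-removal step of this kind is needed at inference time; one-to-one matching avoids it by training each object to be claimed by a single query.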


Reconstructing Hand-Held Objects from Monocular Video

Di Huang , Xiaopeng Ji , Xingyi He , Jiaming Sun , Tong He , Qing Shuai , Wanli Ouyang , Xiaowei Zhou

Category: Computer Vision

2022-11-30

This paper presents an approach that reconstructs a hand-held object from a monocular video. In contrast to many recent methods that directly predict object geometry by a trained network, the proposed approach does not require any learned prior about the object and is able to recover more accurate and detailed object geometry. The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker. Then, the object geometry can be recovered by solving a multi-view reconstruction problem. We devise an implicit neural representation-based method to solve the reconstruction problem and address the issues of imprecise hand pose estimation, relative hand-object motion, and insufficient geometry optimization for small objects. We also provide a newly collected dataset with 3D ground truth to validate the proposed approach.


Diffusion Probabilistic Model Made Slim

Xingyi Yang , Daquan Zhou , Jiashi Feng , Xinchao Wang

Category: Computer Vision

2022-11-27

Despite the visually pleasing results achieved recently, massive computational cost has been a long-standing flaw of diffusion probabilistic models (DPMs), which in turn greatly limits their application on resource-limited platforms. Prior methods towards efficient DPMs, however, have largely focused on accelerating testing while overlooking the models' huge complexity and size. In this paper, we make a dedicated attempt to lighten DPMs while striving to preserve their favourable performance. We start by training a small-sized latent diffusion model (LDM) from scratch, but observe a significant fidelity drop in the synthetic images. Through a thorough assessment, we find that DPMs are intrinsically biased against high-frequency generation and learn to recover different frequency components at different time-steps. These properties make compact networks unable to represent frequency dynamics with accurate high-frequency estimation. To this end, we introduce a customized design for slim DPMs, which we term Spectral Diffusion (SD), for light-weight image synthesis. SD incorporates wavelet gating in its architecture to enable frequency-dynamic feature extraction at every reverse step, and conducts spectrum-aware distillation to promote high-frequency recovery by inversely weighting the objective based on spectrum magnitudes. Experimental results demonstrate that SD achieves an 8-18x reduction in computational complexity compared to latent diffusion models on a series of conditional and unconditional image generation tasks, while retaining competitive image fidelity.


Learning with Recoverable Forgetting

Jingwen Ye , Yifang Fu , Jie Song , Xingyi Yang , Songhua Liu , Xin Jin , Mingli Song , Xinchao Wang

Category: Computer Vision

2022-07-17

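Life-long learning aims to learn a sequence of tasks without forgetting previously acquired knowledge. However, the training data involved may not remain legitimate to use for the whole lifetime, because of privacy or copyright reasons. In practical scenarios, for instance, the model owner may wish to enable or disable the knowledge of specific tasks or specific samples from time to time. Unfortunately, such flexible control over knowledge transfer has been largely overlooked in previous incremental or decremental learning methods, even at the level of problem setup. In this paper, we explore a novel learning scheme, termed Learning wIth Recoverable Forgetting (LIRF), that explicitly handles task- or sample-specific knowledge removal and recovery. Specifically, LIRF introduces two innovative schemes, knowledge deposit and withdrawal, which allow user-designated knowledge to be isolated from a pre-trained network and injected back when necessary. During knowledge deposit, the specified knowledge is extracted from the target network and stored in a deposit module, while the insensitive or general knowledge of the target network is preserved and further augmented. During knowledge withdrawal, the removed knowledge is added back to the target network. The deposit and withdrawal processes require only a few epochs of fine-tuning on the removal data, ensuring both data and time efficiency. We conduct experiments on several datasets and demonstrate that the proposed LIRF strategy yields encouraging results with gratifying generalization capability.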


Factorizing Knowledge in Neural Networks

Xingyi Yang , Jingwen Ye , Xinchao Wang

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-07-04

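In this paper, we explore a novel and ambitious knowledge-transfer task, termed Knowledge Factorization (KF). The core idea of KF lies in the modularization and assemblability of knowledge: given a pretrained network model as input, KF aims to decompose it into several factor networks, each of which handles only a dedicated task and maintains task-specific knowledge factorized from the source network. Such factor networks are disentangled by task and can be directly assembled, without any fine-tuning, to produce more competent combined-task networks. In other words, the factor networks serve as Lego-like building blocks, allowing us to construct customized networks in a plug-and-play manner. Specifically, each factor network comprises two modules: a common-knowledge module that is task-agnostic and shared by all factor networks, and a task-specific module dedicated to the factor network itself. We introduce an information-theoretic objective, InfoMax-Bottleneck (IMB), to carry out KF by optimizing the mutual information between the learned representations and the input. Experiments on various benchmarks show that the derived factor networks perform well not only on their dedicated tasks but also in terms of disentanglement, while enjoying better interpretability and modularity. Moreover, the learned common-knowledge representations yield impressive results in transfer learning.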
